16 research outputs found
Data Mining in Internet of Things Systems: A Literature Review
The Internet of Things (IoT) and cloud technologies have been the main focus of recent research, allowing for the accumulation of a vast amount of data generated from this diverse environment. These data undoubtedly contain priceless knowledge, provided it can be correctly discovered and correlated in an efficient manner. Data mining algorithms can be applied to the IoT to extract hidden information from the massive amounts of data it generates, information thought to have high business value. In this paper, the most important data mining approaches, covering classification, clustering, association analysis, time series analysis, and outlier analysis, are surveyed, together with recent work in this direction. Other significant challenges in the field, namely collecting, storing, and managing the large number of devices along with their associated features, are also discussed. Throughout, a deep look at data mining for IoT platforms is given, concentrating on real applications found in the literature.
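As a concrete illustration of one approach surveyed above, the sketch below computes support and confidence, the two basic measures of association analysis, over a toy transaction log of co-occurring IoT events. The item names and data are purely illustrative.

```python
# Toy transaction database: each transaction is a set of items
# observed together (e.g. sensor events co-occurring in an IoT log).
transactions = [
    {"temp_high", "fan_on"},
    {"temp_high", "fan_on", "door_open"},
    {"temp_high", "fan_on"},
    {"door_open"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Conditional frequency of `consequent` given `antecedent`.
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"temp_high", "fan_on"}, transactions))       # 0.75
print(confidence({"temp_high"}, {"fan_on"}, transactions))  # 1.0
```

A rule such as {temp_high} → {fan_on} would be kept only if both measures exceed user-chosen thresholds, which is the pruning step at the core of association-rule miners like Apriori.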
WEB-BASED DUPLICATE RECORDS DETECTION WITH ARABIC LANGUAGE ENHANCEMENT
Sharing data between organizations has growing importance in many data mining projects. Data from various heterogeneous sources often has to be linked and aggregated in order to improve data quality. The importance of data accuracy and quality has increased with the explosion of data size. The first step toward ensuring data accuracy is to make sure that each real-world object is represented once and only once in a given dataset, a task called Duplicate Record Detection (DRD). These data inaccuracy problems exist due to several factors, including spelling, typographical, and pronunciation variation, dialects, special vowel and consonant distinctions, and other linguistic characteristics, especially with non-Latin languages like Arabic. In this paper, an English/Arabic-enabled web-based framework is designed and implemented that incorporates user interaction, allowing users to add new rules, enrich the dictionary, and evaluate results, an important step toward improving the system's behavior. The proposed framework supports processing both single-language and bilingual datasets. It is implemented and verified empirically in several case studies. The comparison results showed that the proposed system achieves substantial improvements over known tools.
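A threshold-based duplicate check of the kind such rule-based frameworks build on can be sketched with a string-similarity ratio. The normalization, example names, and threshold below are illustrative assumptions, not the paper's actual rules.

```python
import difflib

def normalize(record):
    # Lightweight normalization: lowercase and collapse whitespace.
    return " ".join(record.lower().split())

def is_duplicate(a, b, threshold=0.85):
    # Flag two records as duplicates when their similarity ratio
    # exceeds `threshold` (a tunable rule, as in rule-based DRD).
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold

print(is_duplicate("Mohammed Ali", "mohamed ali"))  # True  (spelling variant)
print(is_duplicate("Mohammed Ali", "Sara Hassan"))  # False
```

Real Arabic-aware DRD would add language-specific normalization (e.g. unifying alef and taa-marbuta variants) before comparison, which is where the user-maintained rules and dictionary come in.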
TCE at Qur'an QA 2022: Arabic Language Question Answering Over Holy Qur'an Using a Post-Processed Ensemble of BERT-based Models
In recent years, we witnessed great progress in different tasks of natural language understanding using machine learning. Question answering is one of these tasks; it is used by search engines and social media platforms for improved user experience. Arabic is the language of the Holy Qur'an, the sacred text for 1.8 billion people across the world, and it is a challenging language for Natural Language Processing (NLP) due to its complex structures. In this article, we describe our attempts at the OSACT5 Qur'an QA 2022 Shared Task, a question answering challenge on the Holy Qur'an in Arabic. We propose an ensemble learning model based on Arabic variants of BERT models. In addition, we perform post-processing to enhance the model predictions. Our system achieves a Partial Reciprocal Rank (pRR) score of 56.6% on the official test set.
Comment: OSACT5 workshop, Qur'an QA 2022 Shared Task participation by TCE
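One simple way to combine ranked answer lists from several BERT variants is reciprocal-rank fusion. The sketch below assumes each model emits a ranked list of candidate passages (the identifiers are invented); the paper's actual ensembling and post-processing may well differ.

```python
from collections import defaultdict

def fuse_rankings(rankings):
    # Combine ranked answer lists from several models by summing
    # reciprocal-rank scores: an answer ranked r-th by a model
    # contributes 1/r to its fused score.
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, answer in enumerate(ranking, start=1):
            scores[answer] += 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked outputs of three fine-tuned BERT variants.
model_a = ["verse_12", "verse_7", "verse_3"]
model_b = ["verse_7", "verse_12", "verse_9"]
model_c = ["verse_7", "verse_3", "verse_12"]

print(fuse_rankings([model_a, model_b, model_c]))
# → ['verse_7', 'verse_12', 'verse_3', 'verse_9']
```

Fusion of this kind rewards answers that several models agree on, which is why ensembles of diverse BERT variants tend to beat any single model on pRR-style metrics.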
OFCOD: On the Fly Clustering Based Outlier Detection Framework
In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data analysis, image processing, fraud detection, and intrusion detection. An extensive variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, meaning that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework of its kind, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers on demand, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records, and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.
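The core idea of clustering-based outlier detection, flagging points that lie far from every cluster, can be sketched as follows. The centroids and threshold are illustrative; OFCOD's actual criterion and its on-the-fly machinery are more elaborate.

```python
import math

def nearest_centroid_distance(point, centroids):
    # Distance from a point to its closest cluster centroid.
    return min(math.dist(point, c) for c in centroids)

def detect_outliers(points, centroids, threshold):
    # Flag any point farther than `threshold` from every cluster
    # centroid (a simplified distance rule for illustration).
    return [p for p in points
            if nearest_centroid_distance(p, centroids) > threshold]

# Two illustrative clusters and three points; (5, 5) belongs to neither.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (9.8, 10.1), (5.0, 5.0)]
print(detect_outliers(points, centroids, threshold=2.0))  # [(5.0, 5.0)]
```

Precomputing and caching the clustering is what lets a framework answer repeated outlier requests "on the fly" instead of re-clustering per request.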
ANFIS-based PID continuous sliding mode controller for robot manipulators in joint space
This paper presents a feasible design for a control algorithm to synthesize an adaptive neuro-fuzzy inference system-based PID continuous sliding mode control system (ANFIS-PIDCSMC) for adaptive trajectory tracking control of rigid robot manipulators (RRMs) in the joint space. First, a PID sliding mode control algorithm with sliding surface dynamics-based continuous proportional-integral (PI) control action (PIDSMC-SSDCPI) is presented. The global stability conditions are formulated in terms of a Lyapunov full quadratic form such that the robot system output can track the desired reference output. Second, to increase the control system's robustness, the PI control action in the PIDSMC-SSDCPI controller is replaced by an ANFIS control signal, yielding the proposed ANFIS-PIDCSMC approach. Numerical simulations using the dynamic model of an RRM with uncertainties and external disturbances show the high quality and effectiveness of the adopted control approach in high-speed trajectory tracking problems. The simulation results, compared with those obtained for traditional controllers (a standalone PID controller and a traditional sliding mode controller (TSMC)), show that the robot system achieves acceptable tracking performance.
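The PID sliding surface and the continuous (saturated) switching law at the heart of such controllers can be illustrated on a single-joint toy model. This is a hypothetical 1-DOF sketch with invented gains, not the paper's multi-joint ANFIS design.

```python
# Hypothetical 1-DOF sketch: a unit-mass joint x'' = u tracking a
# zero reference. Gains and boundary layer are invented for
# illustration; the paper's controller acts on a multi-joint RRM.
def pid_sliding_surface(e, e_dot, e_int, kp, ki, kd):
    # s = kd*e' + kp*e + ki*integral(e); keeping s = 0 enforces a
    # stable (here critically damped) PID error dynamic.
    return kd * e_dot + kp * e + ki * e_int

def smc_control(s, gain, boundary):
    # Continuous (saturated) switching law: linear inside the
    # boundary layer, which suppresses chattering.
    return -gain * max(-1.0, min(1.0, s / boundary))

x, v, e_int, dt = 1.0, 0.0, 0.0, 0.001   # initial error of 1 rad
for _ in range(10_000):                  # simulate 10 s
    e_int += x * dt
    s = pid_sliding_surface(x, v, e_int, kp=6.0, ki=9.0, kd=1.0)
    u = smc_control(s, gain=20.0, boundary=0.5)
    v += u * dt                          # x'' = u  (unit mass)
    x += v * dt
print(abs(x))  # residual tracking error: near zero
```

In the paper's scheme, the fixed PI part of this law is what gets replaced by an ANFIS output, so the effective control adapts online to model uncertainty.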
Classification of Brain MRI Tumor Images Based on Deep Learning PGGAN Augmentation
The wide prevalence of brain tumors in all age groups necessitates having the ability to make an early and accurate identification of the tumor type and thus select the most appropriate treatment plans. The application of convolutional neural networks (CNNs) has helped radiologists to more accurately classify the type of brain tumor from magnetic resonance images (MRIs). The learning of a CNN suffers from overfitting if an insufficient number of MRIs is introduced to the system. Recognized as the current best solution to this problem, the augmentation method allows for the optimization of the learning stage and thus maximizes the overall efficiency. The main objective of this study is to examine the efficacy of a new approach to the classification of brain tumor MRIs through the use of a VGG19 feature extractor coupled with one of three types of classifiers. A progressive growing generative adversarial network (PGGAN) augmentation model is used to produce ‘realistic’ MRIs of brain tumors and help overcome the shortage of images needed for deep learning. Results indicated the ability of our framework to classify gliomas, meningiomas, and pituitary tumors more accurately than in previous studies, with an accuracy of 98.54%. Other performance metrics were also examined.
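The extract-features-then-classify pipeline can be sketched with a stand-in classifier: the 2-D "features" below play the role of VGG19 activations, and the nearest-centroid rule is an illustrative substitute for the paper's three classifier types.

```python
import math

def nearest_centroid_fit(features, labels):
    # Compute one mean feature vector (centroid) per class.
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, value in enumerate(f):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def nearest_centroid_predict(feature, centroids):
    # Assign the class whose centroid is closest in feature space.
    return min(centroids, key=lambda y: math.dist(feature, centroids[y]))

# Toy 2-D "feature vectors" standing in for VGG19 activations.
features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = ["glioma", "glioma", "meningioma", "meningioma"]
centroids = nearest_centroid_fit(features, labels)
print(nearest_centroid_predict([0.85, 0.85], centroids))  # meningioma
```

Freezing a pretrained extractor and training only a light classifier on top is precisely what makes small, augmented MRI datasets workable.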
A Reliable Event-Driven Strategy for Real-Time Multiple Object Tracking Using Static Cameras
Because of its importance in computer vision and surveillance systems, object tracking has progressed rapidly over the last two decades. Research on such systems still faces several theoretical and technical problems that badly impact not only the accuracy of position measurements but also the continuity of tracking. In this paper, a novel strategy for tracking multiple objects using static cameras is introduced, which can be used to build a cheap, easy-to-install, and robust tracking system. The proposed tracking strategy is based on scenes captured by a number of static video cameras. Each camera is attached to a workstation that analyzes its stream. All workstations are connected directly to the tracking server, which coordinates the system, collects the data, and creates the output spatio-temporal database. Our contribution is twofold. The first part is a new methodology for transforming the image coordinates of an object to its real-world coordinates. The second is a flexible event-based object tracking strategy. The proposed tracking strategy has been tested on a CAD model of a soccer game environment. Preliminary experimental results show the robust performance of the proposed tracking strategy.
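For a static camera viewing a planar surface such as a pitch, the image-to-real coordinate transformation is commonly expressed as a 3x3 planar homography. The sketch below uses an invented scaling matrix for H; a deployed system would calibrate H per camera, and the paper's own methodology may differ.

```python
def apply_homography(H, u, v):
    # Map image pixel (u, v) to ground-plane coordinates via a 3x3
    # planar homography H, dividing by the projective scale w.
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w

# Illustrative pure-scaling homography: 1 pixel = 0.05 m on the pitch.
H = [[0.05, 0.0, 0.0],
     [0.0, 0.05, 0.0],
     [0.0, 0.0, 1.0]]
print(apply_homography(H, 640, 360))  # (32.0, 18.0)
```

In practice H is estimated from at least four point correspondences between known pitch landmarks and their pixel positions; the server can then fuse the per-camera ground-plane tracks into one spatio-temporal database.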
Deep-Risk: Deep Learning-Based Mortality Risk Predictive Models for COVID-19
The SARS-CoV-2 virus has proliferated around the world and caused panic worldwide as it claimed many lives. Since COVID-19 is highly contagious and spreads quickly, an early diagnosis is essential. Identifying COVID-19 patients' mortality risk factors is essential for reducing this risk among infected individuals. For the timely examination of large datasets, new computing approaches must be created. Many machine learning (ML) techniques have been developed to predict the mortality risk factors and severity for COVID-19 patients. Contrary to expectations, deep learning approaches, as well as ML algorithms, have not been widely applied in predicting mortality and severity from COVID-19. Furthermore, the accuracy achieved by ML algorithms is less than the anticipated values. In this work, three supervised deep learning predictive models are utilized to predict the mortality risk and severity for COVID-19 patients. The first one, which we refer to as CV-CNN, is built using a convolutional neural network (CNN); it is trained using a clinical dataset of 12,020 patients and is based on the 10-fold cross-validation (CV) approach for training and validation. The second predictive model, which we refer to as CV-LSTM + CNN, is developed by combining the long short-term memory (LSTM) approach with a CNN model. It is also trained using the clinical dataset based on the 10-fold CV approach for training and validation. The first two predictive models use the clinical dataset in its original CSV form. The last one, which we refer to as IMG-CNN, is a CNN model trained instead on converted images of the clinical dataset, where each image corresponds to a data row from the original clinical dataset. The experimental results revealed that the IMG-CNN predictive model outperforms the other two with an average accuracy of 94.14%, a precision of 100%, a recall of 91.0%, a specificity of 100%, an F1-score of 95.3%, an AUC of 93.6%, and a loss of 0.22.
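The row-to-image conversion behind a model like IMG-CNN can be sketched as min-max scaling each feature to a pixel intensity and padding into a square grid. The feature names, ranges, and grid layout here are illustrative assumptions; the paper's exact conversion may differ.

```python
import math

def row_to_image(row, lo, hi, side=None):
    # Map one clinical-record row to a square grayscale grid: each
    # feature is min-max scaled to 0..255 and becomes one pixel;
    # leftover cells are zero-padded.
    side = side or math.ceil(math.sqrt(len(row)))
    pixels = [round(255 * (v - l) / (h - l)) for v, l, h in zip(row, lo, hi)]
    pixels += [0] * (side * side - len(pixels))
    return [pixels[i * side:(i + 1) * side] for i in range(side)]

# Hypothetical features: age, sex, temperature, SpO2, systolic BP.
row = [63.0, 1.0, 37.8, 92.0, 140.0]
lo  = [0.0, 0.0, 35.0, 50.0, 80.0]     # per-feature minimum
hi  = [100.0, 1.0, 42.0, 100.0, 200.0] # per-feature maximum
img = row_to_image(row, lo, hi)
print(img)  # [[161, 255, 102], [214, 128, 0], [0, 0, 0]]
```

Once tabular rows become fixed-size images, an off-the-shelf CNN can exploit its convolutional inductive bias on data that was never spatial to begin with, which is the design gamble such models make.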
QoS optimization for cloud service composition based on economic model
Cloud service composition is usually long term based and economically driven. Services in cloud computing can be categorized into two groups: Application services and Computing Services. Compositions in the application level are similar to the Web service compositions in Service-Oriented Computing. Compositions in the computing level are similar to the task matching and scheduling in grid computing. We consider cloud service composition from end users perspective. We propose Genetic Algorithm-based approach to model the cloud service composition problem. A comparison is given between the proposed composition approach and other existing algorithms such as Integer Linear Programming. The experiment results proved the efficiency of the proposed approach. Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2015.This work was made possible by NPRP grant # 7 - 481-1 - 088 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.Scopu